Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plots: return errors in json format #9146

Merged
merged 5 commits into from
Mar 27, 2023

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Mar 9, 2023

This PR changes the schema to:

{
  "errors": [
    {
      "name": "<plot_name>",
      "rev": "<rev>",
      "source": "<source_file>",
      "type": "<Type>",
      "msg": "<msg>",
    },
 ],
 "data": {}
}

The existing data is kept inside "data" and the new "errors" key is introduced where errors are a list of errors.

Closes #9025.
Closes #7692.

@codecov
Copy link

codecov bot commented Mar 9, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.23 ⚠️

Comparison is base (d41512c) 92.85% compared to head (864b68d) 92.62%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9146      +/-   ##
==========================================
- Coverage   92.85%   92.62%   -0.23%     
==========================================
  Files         458      458              
  Lines       37051    37103      +52     
  Branches     5344     5349       +5     
==========================================
- Hits        34404    34367      -37     
- Misses       2119     2188      +69     
- Partials      528      548      +20     
Impacted Files Coverage Δ
dvc/commands/plots.py 89.41% <100.00%> (+0.52%) ⬆️
dvc/render/match.py 96.34% <100.00%> (-2.19%) ⬇️
dvc/repo/plots/__init__.py 91.84% <100.00%> (ø)
tests/func/plots/test_show.py 100.00% <100.00%> (ø)
tests/integration/plots/test_plots.py 94.47% <100.00%> (+0.38%) ⬆️
tests/unit/command/test_plots.py 100.00% <100.00%> (ø)
tests/unit/render/test_match.py 100.00% <100.00%> (ø)

... and 22 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@skshetry skshetry marked this pull request as draft March 9, 2023 15:26
@dberenbaum
Copy link
Collaborator

@mattseddon PTAL if this gives you what you need to show errors in VS Code.

@shcheklein
Copy link
Member

@skshetry please add the description (if you feel it's ready for Matt, me, and other folks to participate - asking only because Dave pinged us), examples, docs, etc. It's hard to review it this way from the product perspective.

@skshetry skshetry marked this pull request as ready for review March 10, 2023 03:19
@skshetry
Copy link
Member Author

@shcheklein, I added a description above. Let me know if you need anything more. 🙂

@skshetry skshetry added breaking-change A: cli Related to the CLI A: plots Related to the plots and removed A: cli Related to the CLI labels Mar 13, 2023
@mattseddon
Copy link
Member

For context/background:

These are the areas in the UI that we are planning to show errors

image

  • 1 will show any errors for the selected revisions within the plot.
  • 2 will contain any errors for the revision.
  • 3 will be specifically related to an image missing for a revision. This is a special case where it might not actually be an error depending on the type of project.

Based on the above I think that in the extension we'll probably hold the list of errors in the following format:

{
  path: string // i.e title, e.g dvc.yaml::Accuracy
  revision: string
  sourceFile: string
  type: string
  msg: string
}[]

From the above structure, we'll be able to easily group/consolidate errors to fit them into each of the three places. The structure is also fairly extensible so we should avoid getting trapped further down the line. We will very likely end up displaying concatenated msgs (unless msg is blank and then we'll need to fill the gap). E.g in 1 find all of the entries in the list with path == id and concatenate <revision>: <msg> into a newline delimited string to display in a tooltip.

If you could provide a top-level error key that would work the best as it would involve the least amount of parsing on the extension side but as I'll be creating a list of dicts I'd propose at least changing the schema to:

{"errors": {
    "<revision>": list({
        "type": "<exception_type>",
        "msg": "<message>"
      })
  }
}

If this doesn't work for other clients of --json then I'd be happy to continue parsing on the extension side.

Hope that helps. LMK what you think.

@skshetry
Copy link
Member Author

{"errors": {
    "<revision>": list({
        "type": "<exception_type>",
        "msg": "<message>"
      })
  }
}

With this format, how will you figure out which image and what revision failed? Or, are you going to consider lack of data as failed image plot?

@mattseddon
Copy link
Member

{"errors": {
    "<revision>": list({
        "type": "<exception_type>",
        "msg": "<message>"
      })
  }
}

With this format, how will you figure out which image and what revision failed? Or, are you going to consider lack of data as failed image plot?

Sorry, I wasn't clear. That would only work if it is under the plot key e.g

{
  "dvc.yaml::not-existing": [
    {
      "errors": {
        "<revision>": list({
          "type": "<exception_type>",
          "msg": "<message>"
        })
      }
      ...

A top-level error key would need to contain all of the information for each error. I.e

{
  path: string // i.e title, e.g dvc.yaml::Accuracy
  revision: string
  sourceFile: string
  type: string
  msg: string
}[]

That would give enough to work out which images are missing or am I missing something?

@skshetry
Copy link
Member Author

That would give enough to work out which images are missing or am I missing something?

That should work. Note that there may not always be source_file, as not all errors are related to sources. (Also let's rename sourceFile to just source?)

@skshetry
Copy link
Member Author

skshetry commented Mar 14, 2023

@mattseddon, should we move the plots' data to "plots" or something?

Note that json is only used by vscode (and is hidden for CLI users). So the format is more or less up to you.

@skshetry
Copy link
Member Author

skshetry commented Mar 14, 2023

I have changed the format for errors to be a top-level key, and the format is:

{
  name: string // i.e title, e.g dvc.yaml::Accuracy
  revision: string
  source?: string
  type: string
  msg: string
}[]

so instead of path, it's name, and source instead of sourceFile.

I'll suggest moving plots data inside plots or data key too, as there could be plot with "errors" path.

touch errors
dvc plots show --json errors
{
  "errors": [],
}

@mattseddon
Copy link
Member

I have changed the format for errors to be a top-level key, and the format is:

{
  name: string // i.e title, e.g dvc.yaml::Accuracy
  revision: string
  source?: string
  type: string
  msg: string
}[]

so instead of path, it's name, and source instead of sourceFile.

I'll suggest moving plots data inside plots or data key too, as there could be plot with "errors" path.

touch errors
dvc plots show --json errors
{
  "errors": [],
}

Yep, we can use data 👍🏻.

@skshetry
Copy link
Member Author

skshetry commented Mar 14, 2023

@mattseddon, please take a look at this PR. Should be ready then.

It introduces top-level data and errors key.

Copy link
Member

@mattseddon mattseddon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change/API LGTM. The next steps for me are to:

  1. Accomodate the breaking change of the new data key.
  2. Build out the errors functionality to make sure nothing is missing from the API (unlikely).

As long as 1 is ready for when this is merged/released then everything should be fine. Up to you on whether or not you want to merge/potentially iterate or wait for me to work through steps 1 & 2 beforehand 🙏🏻.

Just give me a heads-up if this is going to get released.

One question that I have:

Will it still be possible to make plots diff fail with an error or will it always return something? Asking for iterative/vscode-dvc#3222 + #9025

dvc/commands/plots.py Outdated Show resolved Hide resolved
@skshetry
Copy link
Member Author

Just give me a heads-up if this is going to get released.

Since this is a breaking change for VSCode, I am fine with waiting.

Will it still be possible to make plots diff fail with an error or will it always return something? Asking for iterative/vscode-dvc#3222 + #9025

#9025 might get fixed with this PR (cc @dberenbaum).
This should cover a lot of cases, but at this time, we can't guarantee that it'll always return a JSON output, please do check for the exit code.
I don't want to try .. catch everything, so I think it's better to handle errors when it fails on a case-by-case basis.

dvc should handle expected failures well, but loudly fail in other cases.

@dberenbaum
Copy link
Collaborator

#9025 might get fixed with this PR (cc @dberenbaum). This should cover a lot of cases, but at this time, we can't guarantee that it'll always return a JSON output, please do check for the exit code.

Tried this out and noted an issue I found in #9188. However, it does look like it should close #9025 since that specific error is now handled by this PR.

@skshetry
Copy link
Member Author

@dberenbaum, FYI we changed the format of JSON output from our last discussion to the one @mattseddon suggested in #9146 (comment).

@mattseddon mattseddon mentioned this pull request Mar 15, 2023
2 tasks
@mattseddon
Copy link
Member

@skshetry will the errors key be optional or will it always be provided (and empty if there are no errors)? If I get a choice can we make it optional?

@skshetry
Copy link
Member Author

skshetry commented Mar 16, 2023

Same with 'data’ key when there are no plots.

@mattseddon
Copy link
Member

mattseddon commented Mar 16, 2023

I am trying to add this branch into the virtual env for vscode-dvc-demo. Is there a "simple" way for me to resolve this:

ERROR: Cannot install -r requirements.txt (line 4), dvc 0.68.2.dev4532+gff11f9d7 (from git+https://github.com/skshetry/dvc.git@plots-errors-json) and dvclive==2.3.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested dvc 0.68.2.dev4532+gff11f9d7 (from git+https://github.com/skshetry/dvc.git@plots-errors-json)
    dvclive 2.3.1 depends on dvc>2.45.1
    The user requested dvc 0.68.2.dev4532+gff11f9d7 (from git+https://github.com/skshetry/dvc.git@plots-errors-json)
    dvclive 2.3.1 depends on dvc>2.45.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Requirements for the demo project are:

git+https://github.com/skshetry/dvc.git@plots-errors-json
torch==1.13.1
torchvision==0.14.1
dvclive==2.3.1

edit: I can just pip install the branch for now pip install git+https://github.com/skshetry/dvc.git@plots-errors-json

@skshetry
Copy link
Member Author

Can you try something like this:

git+https://github.com/iterative/dvc.git@refs/pull/9146/head

dvc/render/match.py Outdated Show resolved Hide resolved
dps, rev_props = converter.flat_datapoints(rev)
try:
dps, rev_props = converter.flat_datapoints(rev)
except Exception as e: # noqa: BLE001, pylint: disable=broad-except
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be broad?
flat_datapoints should only raise DvcException, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to match what we do on error_handler. I see flat_datapoints() reraising DvcException on some instances.

The question is, is there a potential to raise other errors like OSError, etc? And looking at the image.flat_datapoints(), it might raise other exceptions. So I don't want this to fail in any case.

dvc/render/match.py Outdated Show resolved Hide resolved
@mattseddon
Copy link
Member

mattseddon commented Mar 21, 2023

@skshetry can you confirm that for images name & source will have the same value for error entries? Will they both always be the path to the image or can they diverge? I don't want to make another bad assumption.

Please and thank you.

@skshetry
Copy link
Member Author

can you confirm that for images name & source will have the same value for error entries? Will they both always be the path to the image or can they diverge?

yes, they will always have same value. The name is a plot id which happens to be the same as a filename, source is filename.

@mattseddon
Copy link
Member

@skshetry thanks for being patient. This change provides what is needed for the extension 🙏🏻.

@skshetry skshetry merged commit 43195a8 into iterative:main Mar 27, 2023
@skshetry skshetry deleted the plots-errors-json branch March 27, 2023 07:08
@mattseddon
Copy link
Member

mattseddon commented Mar 28, 2023

@skshetry

I did find one issue with this. If the dvc.yaml from a revision is completely broken (fails validation) then nothing will be returned for that revision in the output.

Examples

workspace dvc.yaml is broken:

❯ dvc plots diff workspace HEAD -o .dvc/tmp/plots --split --json > output.json
DVC failed to load some plots for following revisions: 'workspace'.        

output.json contains no errors key.

HEAD dvc.yaml is broken:

❯ dvc plots diff workspace HEAD -o .dvc/tmp/plots --split --json > output.json
DVC failed to load some plots for following revisions: 'HEAD'.        

output.json contains no errors key.

HEAD and workspace are both broken:

❯ dvc plots diff workspace HEAD -o .dvc/tmp/plots --split --json > output.json
DVC failed to load some plots for following revisions: 'workspace, HEAD'.        

output.json is an empty dict.

Can you provide an error for the failed revision stating that it's dvc.yaml failed validation?

Do you want me to raise a follow-up issue/reopen one or is this enough?

Sorry for not finding this sooner.

@skshetry
Copy link
Member Author

@mattseddon, that's expected. DVC tries to find definitions in older commits and tries to use that.

At the moment, we are only returning errors tied to specific plots. I should have noted this down somewhere, sorry.

I worry that returning these errors may be noisy or false-positive. What do you think?

@mattseddon
Copy link
Member

@mattseddon, that's expected. DVC tries to find definitions in older commits and tries to use that.

At the moment, we are only returning errors tied to specific plots. I should have noted this down somewhere, sorry.

👍🏻

I worry that returning these errors may be noisy or false-positive. What do you think?

The problem that I have is that when placed into the larger context this behaviour looks strange. When there is an error in the workspace:

dvc data status => throws an error
dvc exp show => shows error for workspace record
dvc plots diff => loads the previous revision's definitions & templates

I think if there is a failure while loading a dvc.yaml on the LHS then the command should fail loudly, not try to fall back.

Here is a demo of what the extension looks like with a dvc.yaml validation error in the workspace:

Screen.Recording.2023-03-28.at.4.08.13.pm.mov

Same thing with a definition error:

Screen.Recording.2023-03-28.at.4.11.35.pm.mov

Does that make any sense to you?

@skshetry
Copy link
Member Author

I think if there is a failure while loading a dvc.yaml on the LHS then the command should fail loudly, not try to fall back.

That's exactly the discussion in #9025.

@skshetry
Copy link
Member Author

@mattseddon, what would you prefer here? For the behaviour change, you'd have to wait for the discussions to get resolved and prioritized. For now, I think we can return all the errors from all revisions (or only from the workspace/lhs side).

@mattseddon
Copy link
Member

@mattseddon, what would you prefer here? For the behaviour change, you'd have to wait for the discussions to get resolved and prioritized. For now, I think we can return all the errors from all revisions (or only from the workspace/lhs side).

I think providing all errors would be best but can you give me some example output for a simple project under the scenarios in #9146 (comment)?

@skshetry
Copy link
Member Author

skshetry commented Mar 28, 2023

I think it'll look similar to above (without name):

{
  "rev": "rev"
  "type": "YAMLSyntaxError",
  "msg": "unable to read: 'dvc.yaml', YAML file structure is corrupted"
}

@mattseddon
Copy link
Member

I think it'll look similar to above (without name):

{
  "revision": "rev"
  "type": "YAMLSyntaxError",
  "msg": "unable to read: 'dvc.yaml', YAML file structure is corrupted"
}

Ok, I should be able to make that work. Please LMK when there is a draft available for me to try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

plots diff: don't make one plot break all the others plots: return error messages for failed plots
5 participants